
Several password spy tutorials have been posted to
CodeGuru, but all of them rely on Windows hooks. Is there any
other way to make such a utility? Yes, there is. But first, let me
review the problem briefly, just to make sure we're all on the same
page.
To "read" the contents of any control—either belonging to your
application or not—you generally send the WM_GETTEXT
message to it. This also applies to edit controls, except in one
special case. If the edit control belongs to another process and the
ES_PASSWORD style is set, this approach fails. Only the
process that "owns" the password control can get its contents via
WM_GETTEXT. So, our problem reduces to the following:
How to get
::SendMessage( hPwdEdit, WM_GETTEXT, nMaxChars, psBuffer );
executed in the address space of another process.
In general, there are three ways to solve this
problem:
- Put your code into a DLL; then, map the DLL to the remote
process via Windows hooks.
- Put your code into a DLL and map the DLL to the remote process
using the CreateRemoteThread
& LoadLibrary technique.
- Instead of writing a separate DLL, copy your code to the
remote process directly—via
WriteProcessMemory—and
start its execution with CreateRemoteThread. A
detailed description of this technique can be found here.
I. Windows Hooks
Demo applications: HookSpy
and HookInjEx
The primary role of Windows hooks is to monitor the message
traffic of some thread. In general, there are:
1. Local hooks, where you monitor the message traffic of
any thread belonging to your process.
2. Remote hooks, which can be:
   a. thread-specific, to monitor the message traffic of a
   thread belonging to another process;
   b. system-wide, to monitor the message traffic for all
   threads currently running on the system.
If the hooked thread belongs to another process (cases 2a &
2b), your hook procedure must reside in a dynamic-link library
(DLL). The system then maps the DLL containing the hook procedure
into the address space of the hooked thread. Windows will map the
entire DLL, not just the hook procedure. That is why Windows hooks
can be used to inject code into another process's address space.
While I won't discuss hooks any further in this article (take a look
at the SetWindowsHookEx API in MSDN for more details),
let me give you two more hints that you won't find in the
documentation but that might still be useful:
- After a successful call to SetWindowsHookEx, the
system maps the DLL into the address space of the hooked thread
automatically, but not necessarily immediately. Because Windows
hooks are all about messages, the DLL isn't really mapped until a
suitable event happens. For example:
    If you install a hook that monitors all nonqueued
    messages of some thread (WH_CALLWNDPROC), the
    DLL won't be mapped into the remote process until a message
    is actually sent to (some window of) the hooked thread. In
    other words, if UnhookWindowsHookEx is called
    before a message has been sent to the hooked thread, the DLL will
    never be mapped into the remote process (although the call
    to SetWindowsHookEx itself succeeded). To force
    an immediate mapping, send an appropriate event to the
    thread concerned right after the call to
    SetWindowsHookEx.
The same is true for unmapping the DLL after calling
UnhookWindowsHookEx. The DLL isn't really unmapped
until a suitable event happens.
- When you install hooks, they can affect the overall system
performance (especially system-wide hooks). However, you can
easily overcome this shortcoming if you use thread-specific
hooks solely as a DLL mapping mechanism, and not to trap messages.
Consider the following code snippet:
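BOOL APIENTRY DllMain( HANDLE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved )
{
    if( ul_reason_for_call == DLL_PROCESS_ATTACH )
    {
        // Load ourselves once more: this raises the DLL's reference
        // count, so the DLL stays mapped after the hook is removed.
        char lib_name[MAX_PATH];
        ::GetModuleFileName( (HMODULE)hModule, lib_name, MAX_PATH );
        ::LoadLibrary( lib_name );

        // g_hHook is the handle returned by SetWindowsHookEx. It must
        // be visible in this process, too (e.g. kept in a shared
        // data section).
        ::UnhookWindowsHookEx( g_hHook );
    }
    return TRUE;
}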
So, what happens?
First, we map the DLL to the remote
process via Windows hooks. Then, right after the DLL has actually
been mapped, we remove the hook. Normally, the DLL would now be
unmapped, too, as soon as the next message arrives at the hooked
thread. The trick is that we prevent this unmapping by
increasing the DLL's reference count via
LoadLibrary.
The question that remains is: How to unload the DLL now, once
we are finished? UnhookWindowsHookEx won't do it
because we unhooked the thread already. You could do it this
way:
- Install another hook, just before you want to unmap the DLL;
- Send a "special" message to the remote thread;
- Catch this message in your hook procedure; in response, call
FreeLibrary & UnhookWindowsHookEx.
Now, hooks are used only while mapping/unmapping the DLL
to/from the remote process; there is no influence on the
performance of the "hooked" thread in the meantime. Put another
way: We get a DLL mapping mechanism that doesn't interfere with the
target process any more than the LoadLibrary technique
discussed below does (see Section
II.). However, as opposed to the LoadLibrary
technique, this solution works on both WinNT and Win9x.
But, when should one use this trick?
Whenever the DLL has
to be present in the remote process for a longer period of time
(e.g. if you subclass a control belonging to another process) and
you want to interfere with the target process as little as possible. I
didn't use it in HookSpy because the DLL there is injected just
for a moment, only long enough to get the password. Instead, I
provided another example, HookInjEx, to demonstrate it. HookInjEx
maps/unmaps a DLL into "explorer.exe", where it subclasses the
Start button. More precisely: It swaps the left and right mouse
clicks for the Start button.
You will find HookSpy and HookInjEx as well as their sources in
the download package at
the end of the article.
II. The CreateRemoteThread & LoadLibrary
Technique
Demo application: LibSpy
In general, any process can load a DLL dynamically by using the
LoadLibrary API. But, how do we force an external
process to call this function? The answer is
CreateRemoteThread.
Let's take a look at the declaration of the
LoadLibrary and FreeLibrary APIs
first:
HINSTANCE LoadLibrary(
LPCTSTR lpLibFileName
);
BOOL FreeLibrary(
HMODULE hLibModule
);
Now, compare them with the declaration of
ThreadProc—the thread routine—passed to
CreateRemoteThread:
DWORD WINAPI ThreadProc(
LPVOID lpParameter
);
As you can see, all functions use the same calling convention and
all accept a 32-bit parameter. Also, the size of the returned value
is the same. In other words: We may pass a pointer to
LoadLibrary/FreeLibrary as the thread routine to
CreateRemoteThread.
However, there are two problems (see the description for
CreateRemoteThread below):
- The lpStartAddress parameter in
CreateRemoteThread must represent the starting
address of the thread routine in the remote process.
- If lpParameter (the parameter passed to
ThreadProc) is interpreted as an ordinary 32-bit value
(FreeLibrary interprets it as an
HMODULE), everything is fine. However, if
lpParameter is interpreted as a pointer
(LoadLibraryA interprets it as a pointer to a
char string), it must point to some data in the
remote process.
The first problem actually solves itself. Both
LoadLibrary and FreeLibrary are functions
residing in kernel32.dll. Because kernel32.dll is
guaranteed to be present and at the same load address in every
"normal" process (see Appendix
A), the address of LoadLibrary/FreeLibrary is the
same in every process, too. This ensures that a valid pointer is
passed to the remote process.
The second problem is also easy to solve: Simply copy the DLL
module name (needed by LoadLibrary) to the remote
process via WriteProcessMemory.
So, to use the CreateRemoteThread & LoadLibrary
technique, follow these steps:
1. Retrieve a HANDLE to the remote process
(OpenProcess).
2. Allocate memory for the DLL name in the remote process
(VirtualAllocEx).
3. Write the DLL name, including the full path, to the
allocated memory (WriteProcessMemory).
4. Map your DLL into the remote process via
CreateRemoteThread & LoadLibrary.
5. Wait until the remote thread terminates
(WaitForSingleObject); that is, until the call to
LoadLibrary returns. Put another way, the thread will
terminate as soon as our DllMain (called with reason
DLL_PROCESS_ATTACH) returns.
6. Retrieve the exit code of the remote thread
(GetExitCodeThread). Note that this is the value
returned by LoadLibrary, thus the base address
(HMODULE) of our mapped DLL.
7. Free the memory allocated in Step #2
(VirtualFreeEx).
8. Unload the DLL from the remote process via
CreateRemoteThread & FreeLibrary. Pass the
HMODULE handle retrieved in Step #6 to
FreeLibrary (via lpParameter in
CreateRemoteThread).
Note: If your injected DLL
spawns any new threads, be sure they are all terminated before
unloading it.
9. Wait until the thread terminates
(WaitForSingleObject).
Also, don't forget to close all the handles once you are
finished: the handles to both threads, created in Steps #4 and #8,
and the handle to the remote process, retrieved in Step #1.
Let's examine some parts of LibSpy's sources now, to see how the
above steps are implemented in practice. For the sake of simplicity,
error handling and Unicode support are removed.
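HANDLE hThread;
char   szLibPath[_MAX_PATH];  // full path of the DLL; assumed to be filled in already
void*  pLibRemote;            // address in the remote process that receives szLibPath
DWORD  hLibModule;            // base address (HMODULE) of the loaded DLL

// Steps 2 and 3: allocate remote memory and copy the DLL path into it
pLibRemote = ::VirtualAllocEx( hProcess, NULL, sizeof(szLibPath),
                               MEM_COMMIT, PAGE_READWRITE );
::WriteProcessMemory( hProcess, pLibRemote, (void*)szLibPath,
                      sizeof(szLibPath), NULL );

// Step 4: run LoadLibraryA in the remote process
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
              (LPTHREAD_START_ROUTINE) ::GetProcAddress(
                  ::GetModuleHandle("Kernel32"), "LoadLibraryA" ),
              pLibRemote, 0, NULL );

// Steps 5 and 6: wait for LoadLibraryA to return and fetch its result
::WaitForSingleObject( hThread, INFINITE );
::GetExitCodeThread( hThread, &hLibModule );
::CloseHandle( hThread );

// Step 7: with MEM_RELEASE, the size argument must be 0
::VirtualFreeEx( hProcess, pLibRemote, 0, MEM_RELEASE );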
Assume our SendMessage—the code that we actually
wanted to inject—was placed in DllMain
(DLL_PROCESS_ATTACH), so it has already been executed
by now. Then, it is time to unload the DLL from the target
process:
// Step 8: unload the DLL by running FreeLibrary in the remote process
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
              (LPTHREAD_START_ROUTINE) ::GetProcAddress(
                  ::GetModuleHandle("Kernel32"), "FreeLibrary" ),
              (void*)hLibModule, 0, NULL );
::WaitForSingleObject( hThread, INFINITE );
::CloseHandle( hThread );
Interprocess Communications
Until now, we only talked about how to inject the DLL into the
remote process. However, in most situations the injected DLL will
need to communicate with your original application in some way
(recall that the DLL is mapped into some remote process now, not to
our local application!). Take our Password Spy: The DLL has to know
the handle to the control that actually contains the password.
Obviously, this value can't be hard-coded into it at compile time.
Similarly, once the DLL gets the password, it has to send it back to
our application so we can display it appropriately.
Fortunately, there are many ways to deal with this situation:
File Mapping, WM_COPYDATA, the Clipboard, and the
sometimes very handy #pragma data_seg, to name just a
few. I won't describe these techniques here because they are all
well documented either in MSDN (see Interprocess Communications) or
in other tutorials. Anyway, I used solely the #pragma
data_seg in the LibSpy example.
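For readers who haven't met #pragma data_seg before, here is a minimal sketch of a shared data section. It is not taken from LibSpy: the section name .shared, the variable names, and the helper StoreSharedText are assumptions made for illustration, and the window handle is kept as a plain integer so the snippet doesn't depend on windows.h. With the Microsoft compiler, variables placed in such a section of a DLL are shared by every process that maps the DLL; other compilers skip the guarded pragmas and treat the variables as ordinary globals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#ifdef _MSC_VER
#pragma data_seg(".shared")   // everything up to data_seg() goes into ".shared"
#endif
// Variables must be initialised explicitly; otherwise the compiler
// does not place them in the named section.
std::uintptr_t g_hwndValue = 0;          // hypothetical: handle of the password control
char           g_szSharedText[128] = { 0 };  // hypothetical: text handed back by the DLL
#ifdef _MSC_VER
#pragma data_seg()
#pragma comment(linker, "/SECTION:.shared,RWS")   // R = read, W = write, S = shared
#endif

// Copies a string into the shared buffer, truncating it if necessary.
void StoreSharedText(const char* psText)
{
    assert(psText != NULL);
    std::strncpy(g_szSharedText, psText, sizeof(g_szSharedText) - 1);
    g_szSharedText[sizeof(g_szSharedText) - 1] = '\0';
}
```

In such an arrangement, the application would set g_hwndValue before injecting the DLL, the DLL would call something like StoreSharedText from within the remote process, and the application would read g_szSharedText once the remote thread has finished.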
You will find LibSpy and its sources in the download package at
the end of the article.
III. The CreateRemoteThread & WriteProcessMemory
Technique
Demo application: WinSpy
Another way to copy some code to another process's address space
and then execute it in the context of this process involves the use
of remote threads and the WriteProcessMemory API.
Instead of writing a separate DLL, you copy the code to the remote
process directly now—via WriteProcessMemory—and start
its execution with CreateRemoteThread.
Let's take a look at the declaration of
CreateRemoteThread first:
HANDLE CreateRemoteThread(
HANDLE hProcess,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
DWORD dwStackSize,
LPTHREAD_START_ROUTINE lpStartAddress,
LPVOID lpParameter,
DWORD dwCreationFlags,
LPDWORD lpThreadId
);
If you compare it to the declaration of CreateThread
(MSDN), you will notice the following differences:
- The hProcess parameter is additional in
CreateRemoteThread. It is the handle to the process
in which the thread is to be created.
- The lpStartAddress parameter in
CreateRemoteThread represents the starting address of
the thread in the remote process's address space. The function
must exist in the remote process, so we can't simply pass a
pointer to the local ThreadFunc. We have to copy the
code to the remote process first.
- Similarly, the data pointed to by lpParameter
must exist in the remote process,
so we have to copy it there, too.
Now, we can summarize this technique in the following steps:
1. Retrieve a HANDLE to the remote process
(OpenProcess).
2. Allocate memory in the remote process's address space for
injected data (VirtualAllocEx).
3. Write a copy of the initialised INJDATA structure
to the allocated memory (WriteProcessMemory).
4. Allocate memory in the remote process's address space for
injected code.
5. Write a copy of ThreadFunc to the allocated
memory.
6. Start the remote copy of ThreadFunc via
CreateRemoteThread.
7. Wait until the remote thread terminates
(WaitForSingleObject).
8. Retrieve the result from the remote process
(ReadProcessMemory or
GetExitCodeThread).
9. Free the memory allocated in Steps #2 and #4
(VirtualFreeEx).
10. Close the handles retrieved in Steps #6 and #1
(CloseHandle).
Additional caveats that ThreadFunc has to obey:
- ThreadFunc should not call any functions besides
those in kernel32.dll and user32.dll.
Only kernel32 and user32 are guaranteed to be at the same load
address in both the local and the target process, and user32 only
if it is present at all (note that user32 isn't mapped into every
Win32 process!); see Appendix
A. If you need functions from other libraries, pass the
addresses of LoadLibrary and
GetProcAddress to the injected code, and let it
look up the rest itself. You could also use
GetModuleHandle instead of LoadLibrary
if, for one reason or another, the DLL in question is already mapped
into the target process.
Similarly, if you want to call your
own subroutines from within ThreadFunc, copy each
routine to the remote process individually and supply their
addresses to ThreadFunc via INJDATA.
- Don't use any static strings. Rather pass all strings to
ThreadFunc via INJDATA.
Why? The
compiler puts all static strings into the ".data" section of an
executable and only references (=pointers) remain in the code.
Then, the copy of ThreadFunc in the remote process
would point to something that doesn't exist (at least not in its
address space).
- Remove the /GZ compiler switch; it is set by default in debug
builds (see Appendix
B).
- Either declare ThreadFunc and
AfterThreadFunc as static or disable
incremental linking (see Appendix
C).
- There must be less than a page-worth (4 KB) of local variables
in ThreadFunc (see Appendix
D). Note that in debug builds some 10 bytes of the available 4
KB are used for internal variables.
- If you have a switch block with more than three
case statements, either split it up like this:
    switch( expression ) {
        case constant1: statement1; goto END;
        case constant2: statement2; goto END;
        case constant3: statement3; goto END;
    }
    switch( expression ) {
        case constant4: statement4; goto END;
        case constant5: statement5; goto END;
        case constant6: statement6; goto END;
    }
    END:
or modify it into an if-else if sequence (see
Appendix
E).
- ...
You will almost certainly crash the target process if you don't
play by those rules. Just remember: Don't assume anything in
the target process is at the same address as it is in your process
(see Appendix
F).
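To make these rules more concrete, here is a small, portable sketch of a routine written in the required style. It is only an analogy that runs in the local process: the structure DEMODATA, the helper CountChars, and the command codes are hypothetical and do not come from WinSpy. The point is that the routine reaches strings and functions solely through the data block it is handed, and that it uses an if-else if sequence where a larger switch might otherwise appear.

```cpp
#include <cassert>
#include <cstddef>

typedef std::size_t (*COUNTFN)(const char*);

// Hypothetical stand-in for INJDATA: everything the routine needs travels here.
typedef struct {
    COUNTFN fnCount;      // function address supplied by the caller
    char    szText[32];   // string supplied as data, not as a literal in the routine
    int     nCommand;     // selects the operation (1..4)
    int     nResult;      // filled in by the routine
} DEMODATA;

// Helper whose address the caller stores in DEMODATA::fnCount.
std::size_t CountChars(const char* psz)
{
    std::size_t n = 0;
    while (psz[n] != '\0') ++n;
    return n;
}

// Uses nothing but *pData: no string literals, no direct calls to other
// functions, and an if-else if chain instead of a four-case switch.
int DemoThreadFunc(DEMODATA* pData)
{
    if (pData->nCommand == 1)
        pData->nResult = (int)pData->fnCount(pData->szText);  // text length
    else if (pData->nCommand == 2)
        pData->nResult = pData->szText[0];                    // first character
    else if (pData->nCommand == 3)
        pData->nResult = (pData->szText[0] == '\0');          // 1 if empty
    else if (pData->nCommand == 4)
        pData->nResult = (int)sizeof(pData->szText);          // buffer capacity
    else
        pData->nResult = -1;                                  // unknown command
    return pData->nResult;
}

// Caller side: fill in the block before handing it to the routine.
DEMODATA MakeDemoData(const char* pszText, int nCommand)
{
    DEMODATA data = { &CountChars, { 0 }, nCommand, 0 };
    std::size_t len = CountChars(pszText);
    assert(len < sizeof(data.szText));   // text must fit, including the terminator
    for (std::size_t i = 0; i < len; ++i)
        data.szText[i] = pszText[i];
    return data;
}
```

For example, MakeDemoData("pwd", 1) followed by DemoThreadFunc yields 3, the length of the text. In the real technique, the caller would store addresses that are valid in the target process rather than local ones.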
GetWindowTextRemote(A/W)
All the functionality you need to get the password from a
"remote" edit control is encapsulated in
GetWindowTextRemote(A/W):
int GetWindowTextRemoteA( HANDLE hProcess, HWND hWnd, LPSTR lpString );
int GetWindowTextRemoteW( HANDLE hProcess, HWND hWnd, LPWSTR lpString );
Parameters
- hProcess
- Handle to the process the edit control belongs to.
- hWnd
- Handle to the edit control containing the password.
- lpString
- Pointer to the buffer that is to receive the text.
Return Value
The return value is the number of characters copied.
Let's examine some parts of its sources now, especially the
injected data and code, to see how GetWindowTextRemote
works. Again, Unicode support is removed for the sake of
simplicity.
INJDATA
typedef LRESULT (WINAPI *SENDMESSAGE)(HWND,UINT,WPARAM,LPARAM);
typedef struct {
HWND hwnd;
SENDMESSAGE fnSendMessage;
char psText[128];
} INJDATA;
INJDATA is the data structure being injected into
the remote process. Before injecting it, however, our application
initialises the structure's pointer to SendMessageA. The
trick here is that user32.dll is (if present!)
always mapped to the same address in every process; thus, the
address of SendMessageA is always the same, too. This
ensures that a valid pointer is passed to the remote process.
ThreadFunc
static DWORD WINAPI ThreadFunc (INJDATA *pData)
{
pData->fnSendMessage( pData->hwnd, WM_GETTEXT,
sizeof(pData->psText),
(LPARAM)pData->psText );
return 0;
}
static void AfterThreadFunc (void)
{
}
ThreadFunc is the code executed by the remote thread.
Point of interest:
- Note how AfterThreadFunc is used to calculate the
code size of ThreadFunc. In general this isn't the
best idea, because the linker is free to change the order of your
functions (e.g. it could place ThreadFunc behind
AfterThreadFunc). However, you can be pretty sure
that in small projects, like our WinSpy, the order of your
functions will be preserved. If necessary, you could also use the
/ORDER linker option to help you out; or better yet: Determine the
size of ThreadFunc with a disassembler.
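Because the function order isn't guaranteed, it may be worth sanity-checking the computed size before copying anything. The following is a minimal sketch of such a check, not WinSpy's actual code: CheckedCodeSize, the two stand-in functions, and the 4096-byte upper limit are assumptions chosen for illustration. It treats the two addresses as integers and reports 0 whenever the result is implausible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns end - begin if that is a plausible code size, otherwise 0.
// A result of 0 means the functions were reordered (end <= begin) or
// lie too far apart for the difference to be the size of one routine.
std::size_t CheckedCodeSize(std::uintptr_t begin, std::uintptr_t end,
                            std::size_t maxSize)
{
    if (end <= begin)
        return 0;
    std::size_t size = (std::size_t)(end - begin);
    return (size <= maxSize) ? size : 0;
}

// Hypothetical stand-ins for ThreadFunc and AfterThreadFunc.
static int  ThreadFuncStandIn(int x) { return x + 1; }
static void AfterThreadFuncStandIn(void) { }

// Size of ThreadFuncStandIn as seen by this build, or 0 if the
// compiler/linker did not keep the two functions in source order.
std::size_t StandInCodeSize()
{
    return CheckedCodeSize(
        reinterpret_cast<std::uintptr_t>(&ThreadFuncStandIn),
        reinterpret_cast<std::uintptr_t>(&AfterThreadFuncStandIn),
        4096 );   // assumed upper bound for a small injected routine
}
```

If such a check returns 0, the safer course is to refuse to inject rather than to copy a block of unknown size.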
How to Subclass a Remote Control with This Technique
Demo application: InjectEx
Let's explain something more complicated now: how to subclass a
control belonging to another process with this technique.
First of all, note that you have to copy two functions to the
remote process to accomplish this task:
- ThreadFunc, which actually subclasses the control
in the remote process via SetWindowLong, and
- NewProc, the new window procedure of the
subclassed control.
However, the main problem is how to
pass data to the remote NewProc. Because
NewProc is a callback function and thus has to conform
to specific guidelines, we can't simply pass a pointer to
INJDATA to it as an argument. Fortunately, there are
other ways to solve this problem (I found two), but both rely on
assembly language. So, although I have tried to keep assembly
confined to the appendixes until now, we can't do without it this time.
Solution 1
Observe the following picture:

Figure 2: The virtual address space
Note that INJDATA is placed immediately before
NewProc in the remote process. This way
NewProc knows the memory location of
INJDATA in the remote process's address space at
compile time. More precisely: It knows the address of
INJDATA relative to its own location, but that's
actually all we need. Now NewProc might look like this:
static LRESULT CALLBACK NewProc(
HWND hwnd,
UINT uMsg,
WPARAM wParam,
LPARAM lParam )
{
INJDATA* pData = (INJDATA*) NewProc;
pData--;
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam );
}
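The pointer arithmetic behind this layout can be tried out locally, without any remote process. The sketch below is only an illustration under stated assumptions: LAYOUTDATA is a hypothetical stand-in for INJDATA, BuildBlock and ReadDataBefore are made-up helper names, and the "code" is just a couple of placeholder bytes that are never executed. It places the data directly in front of the code in one contiguous block and then recovers the data from nothing but the code's starting address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for INJDATA.
struct LAYOUTDATA {
    int  nValue;
    char szText[16];
};

// Builds one contiguous block: the data first, the code bytes right after it.
std::vector<unsigned char> BuildBlock(const LAYOUTDATA& data,
                                      const unsigned char* pCode,
                                      std::size_t cbCode)
{
    std::vector<unsigned char> block(sizeof(LAYOUTDATA) + cbCode);
    std::memcpy(&block[0], &data, sizeof(LAYOUTDATA));
    if (cbCode > 0)
        std::memcpy(&block[0] + sizeof(LAYOUTDATA), pCode, cbCode);
    return block;
}

// Given the address at which the code starts, steps back by one LAYOUTDATA
// (the equivalent of "pData--" in NewProc) and reads the data found there.
LAYOUTDATA ReadDataBefore(const unsigned char* pCodeStart)
{
    assert(pCodeStart != NULL);
    LAYOUTDATA data;
    std::memcpy(&data, pCodeStart - sizeof(LAYOUTDATA), sizeof(LAYOUTDATA));
    return data;
}
```

Wherever the block ends up in memory, ReadDataBefore finds the data as long as it is given the address of the copy that is actually in use; obtaining that address is the remaining difficulty.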
However, the NewProc shown above still has a problem. Observe its first line:
INJDATA* pData = (INJDATA*) NewProc;
This way, a hard-coded value (the memory location of the original
NewProc in our process) will be assigned to
pData. That is not quite what we want. We want the memory
location of the "current" copy of NewProc in the remote
process, regardless of where that copy has
actually been moved to. In other words, we would need some kind of a "this
pointer."
While there is no way to solve this in C/C++, it can be done with
inline assembly. Consider the modified NewProc:
static LRESULT CALLBACK NewProc(
HWND hwnd,
UINT uMsg,
WPARAM wParam,
LPARAM lParam )
{
INJDATA* pData;
_asm {
call dummy
dummy:
pop ecx
sub ecx, 9
mov pData, ecx
}
pData--;
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam );
}
So, what's going on?
Virtually every processor has a special
register that points to the memory location of the next instruction
to be executed. That's the so-called instruction pointer, denoted
EIP on 32-bit Intel and AMD processors. Because EIP is a
special-purpose register, you can't access it programmatically as
you can general-purpose registers (EAX, EBX, etc.). Put another way:
There is no opcode with which you could address EIP and read or
change its contents explicitly. However, EIP can still be changed
(and is changed all the time) implicitly, by instructions such as
JMP, CALL, and RET. Let's, for
example, look at how the subroutine CALL/RET mechanism
works on 32-bit Intel and AMD processors: